Abstract

Multiple hypotheses testing is a core problem in statistical inference and arises in almost 
every scientific field. Given a sequence of null hypotheses $\mathcal{H}(n) = (H_1, \dots, H_n)$, Benjamini and
Hochberg [BH95] introduced the false discovery rate (FDR), which is the expected proportion of 
false positives among rejected null hypotheses, and proposed a testing procedure that controls 
FDR below a pre-assigned significance level. They also proposed a different criterion, called 
mFDR, which does not control a property of the realized set of tests; rather it controls the ratio 
of expected number of false discoveries to the expected number of discoveries. 

In this paper, we propose two procedures for multiple hypotheses testing that we will call
Lond and Lord. These procedures control FDR and mFDR in an online manner. Concretely,
we consider an ordered, possibly infinite, sequence of null hypotheses $\mathcal{H} = (H_1, H_2, H_3, \dots)$
where, at each step $i$, the statistician must decide whether to reject hypothesis $H_i$ having access
only to the previous decisions. To the best of our knowledge, our work is the first that controls
FDR in this setting. This model was introduced by Foster and Stine [FS07], whose alpha-investing
rule only controls mFDR in an online manner.

In order to compare different procedures, we develop lower bounds on the total discovery rate
under the mixture model where each null hypothesis is truly false with probability $\varepsilon$, for a fixed
arbitrary $\varepsilon$, independently of others. Conditional on the set of true null hypotheses, p-values
are independent, and iid according to some non-uniform distribution for the non-null hypotheses.
Under this model, we prove that both Lond and Lord have a nearly linear number of discoveries.
We further propose an adjustment to Lond to address arbitrary correlation among the p-values.

Finally, we evaluate the performance of our procedures on both synthetic and real data, comparing
them with the alpha-investing rule, the Benjamini-Hochberg method and a Bonferroni procedure.


1 Introduction 

The common practice in claiming a scientific discovery is to support such a claim with a p-value as a
measure of statistical significance. Hypotheses with p-values below a significance level $\alpha$, typically
0.05, are considered to be statistically significant. While this ritual controls type I errors for single
testing problems, in case of testing multiple hypotheses we need to adjust the significance levels to
control other metrics such as family-wise error rate (FWER) or false discovery rate (FDR).

As a concrete example, consider a microarray experiment that studies the changes in genetic
expression levels of thousands of genes between a set of normal control samples and a set of prostate
cancer patients. The goal is to identify a subset of genes that are associated with prostate
cancer. This can be formulated as a multiple hypotheses testing problem with many hypotheses
(say $p$), of which a few (say $s$) are non-null. Indeed, we expect only a small number of genes to
be relevant to the cancer. In such a setting, an unguarded use of single-inference procedures leads to a
large false positive (false discovery) rate. In particular, if we consider a fixed significance level $\alpha$ for
all the tests, each of the $p - s$ truly null hypotheses can be falsely rejected with probability $\alpha$. Therefore,
we get $\alpha(p - s)$ wrong findings in expectation. This situation becomes more dramatic when more
genes are tested over time and $p/s \to \infty$.

Let us stress two challenges that arise with increasing frequency in modern data-analysis problems:

I. The number of hypotheses is unknown or potentially infinite. This is especially the case when 
a given line of research engages numerous teams across the world. Hypotheses are generated 
over time by different researchers and tested without central control. If each test is carried out 
without taking into account previous discoveries, this will generate a constant stream of false 
discoveries. If the underlying number of true facts is bounded, false discoveries will over-run 
true ones over time. 

In the prostate cancer example, more and more factors (genetic and environmental) will be tested
for having significant association with cancer. If previous discoveries are not taken into account, 
this will generate a constant stream of false associations. 

II. Scientific research is decentralized. An easy solution to the previous problem could be obtained 
by coordinating research on a given topic through a central control. In our running example, 
there could be a center that coordinates research on prostate cancer. After accumulating all 
raw experimental data on the issue, and all hypotheses (e.g. all conjectured associations) the 
center carries out the data analysis. For instance, it performs a multiple hypotheses test, 
controlling FDR. 

Of course, the very decentralized nature of scientific research prevents such a solution. We 
instead seek a solution by which at each step the statistician can decide whether to reject 
the current hypothesis on the basis of the current evidence, and minimal information about 
previous hypotheses. 

These remarks motivate the following setting, first introduced in [FS07] (a more formal definition 
will be provided below). 

Hypotheses arrive sequentially in a stream. At each step, the investigator must decide
whether to reject the current null hypothesis without having access to the number of hypotheses
(potentially infinite) or the future p-values, but solely based on the previous
decisions.

In order to illustrate this scenario, consider an approach that would control FWER, i.e. the probability
of rejecting at least one true null hypothesis. This can be achieved by choosing different
significance levels $\alpha_i$ for tests $H_i$, with $(\alpha_i)_{i \ge 1}$ summable, e.g., $\alpha_i = \alpha 2^{-i}$. Notice that the
researcher only needs to know the number of tests performed before the current one, in order to
implement this scheme. However, this leads to small statistical power. In particular, obtaining a
discovery at later steps becomes very unlikely.
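
As a rough illustration of the FWER scheme just described, the following sketch applies geometrically decaying levels $\alpha_i = \alpha 2^{-i}$ to a stream of p-values. The function names `geometric_levels` and `online_fwer_test` are illustrative choices and not notation from the paper; the only information used at step $i$ is the index $i$ itself.

```python
def geometric_levels(alpha, n):
    """Levels alpha_i = alpha * 2**(-i) for i = 1..n; their total never exceeds alpha."""
    return [alpha * 2.0 ** (-i) for i in range(1, n + 1)]


def online_fwer_test(p_values, alpha=0.05):
    """Reject H_i when p_i <= alpha * 2**(-i); only the position i in the stream is needed."""
    return [p <= alpha * 2.0 ** (-i) for i, p in enumerate(p_values, start=1)]


levels = geometric_levels(0.05, 20)
decisions = online_fwer_test([0.01, 0.01, 0.01], alpha=0.05)
# The same p-value 0.01 is rejected at steps 1 and 2 but not at step 3,
# because the level has already dropped to 0.05 / 8 = 0.00625.
```

The rapid decay of `levels` is what makes late discoveries so unlikely under this scheme.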

Since Benjamini and Hochberg's seminal paper [BH95], FDR has been widely used in multiple
testing problems and nowadays serves as the accepted criterion to reduce the risk of spurious discoveries.
FDR controls the proportion of false rejections rather than the probability of at least one false rejection.
This metric is particularly useful when there is no strong interest in any single hypothesis, but
instead we would like to find a set of potential predictors. Benjamini and Hochberg also proposed
a sequential testing procedure, referred to as BH hereafter, to control FDR in multiple testing
problems assuming that all the p-values are given a priori. Let us briefly recall the BH procedure.
Given p-values $p_1, p_2, \dots, p_n$ and a significance level $\alpha$, follow the steps below:

1. Let $p_{(i)}$ be the $i$-th p-value in the (increasing) sorted order, and define $p_{(0)} = 0$. Further, let

$$ i_{\rm BH} = \max\{ 0 \le i \le n : p_{(i)} \le \alpha i / n \}. \tag{1} $$

2. Reject $H_j$ for every test where $p_j \le p_{(i_{\rm BH})}$.

Note that BH requires the knowledge of “all” p-values to determine the significance level for testing 
the hypotheses. Hence, it does not address the scenario described above. 
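
The two BH steps recalled above can be sketched as follows. This is a minimal illustration, with the function name `benjamini_hochberg` chosen here for convenience; it makes explicit that the whole vector of p-values must be available before any decision is made.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where H_j is rejected by the BH procedure."""
    n = len(p_values)
    sorted_p = sorted(p_values)
    # Step 1: largest index i (1-based) with p_(i) <= alpha * i / n; 0 if there is none.
    i_bh = 0
    for i, p in enumerate(sorted_p, start=1):
        if p <= alpha * i / n:
            i_bh = i
    if i_bh == 0:
        return [False] * n
    # Step 2: reject every hypothesis whose p-value is at most p_(i_bh).
    threshold = sorted_p[i_bh - 1]
    return [p <= threshold for p in p_values]


rejections = benjamini_hochberg([0.04, 0.001, 0.5, 0.012], alpha=0.05)
```

Because `i_bh` depends on the sorted list of all p-values, this routine cannot be run on a stream.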

In this paper, we propose a method for online control of false discovery rate. Namely, we consider
a sequence of hypotheses $H_1, H_2, H_3, \dots$ that arrive sequentially in a stream, with corresponding
p-values $p_1, p_2, \dots$. We aim at developing a testing mechanism that ensures that the false discovery rate
remains below a pre-assigned level $\alpha$. A testing procedure provides a sequence of significance levels
$\alpha_i$, with decision rule:

$$ T_i = \begin{cases} 1, & \text{if } p_i \le \alpha_i \quad (\text{reject } H_i), \\ 0, & \text{otherwise} \quad (\text{accept } H_i). \end{cases} \tag{2} $$

In online testing, we require significance levels to be functions of prior outcomes:

$$ \alpha_i = f(\{T_1, T_2, \dots, T_{i-1}\}). \tag{3} $$


One further motivation for online hypothesis testing is that it allows one to exploit domain knowledge
in a more flexible manner. In standard multiple hypotheses testing, domain expertise can be used to 
choose the collection of (null) hypotheses that are likely to be rejected. However, after hypotheses 
are chosen there is no exploitation of domain knowledge. In contrast, in an online framework, domain 
knowledge can be used to order the hypotheses to increase statistical power. Our proposed methods 
have the property that if the hypotheses that are more likely to be rejected appear first, or if they 
arrive in batches, we gain larger statistical power and higher discovery rate. 

Foster and Stine [FS07] proposed the alpha-investing method that controls a modified measure,
called mFDR, in online multiple hypothesis testing (under some technical assumptions). mFDR
is the ratio of the expected number of false discoveries to the expected number of discoveries.
Alpha-investing starts with an initial wealth, at most $\alpha$, of allowable mFDR rate. The wealth is spent for
testing different hypotheses. Each time a discovery occurs, the alpha-investing procedure earns a
contribution toward its wealth to use for further tests. We refer to Section 3 for a more detailed
discussion of the alpha-investing procedure and a comparison with our proposals.

The contributions of this paper are summarized as follows. 

Online control of FDR and mFDR. We present two algorithms, that we call Lond and Lord,
to control FDR and mFDR in an online fashion. The Lond algorithm sets the significance
levels $\alpha_i$ based on the number of discoveries made so far, while Lord sets them according to
the time of the most recent discovery. To the best of our knowledge, this is the first work that
guarantees online control of false discovery rate (FDR). We further propose an adjustment to
Lond to cope with arbitrary dependency among the p-values. Note that in many scenarios,
the investigator chooses the hypotheses based on the previous decisions and hence the test
statistics and the resulting p-values are dependent.

Computing total discovery rate. In order to compare different procedures, we develop lower
bounds on the total discovery rate of our methods under the mixture model where each null
hypothesis is truly false with probability $\varepsilon$, for a fixed arbitrary $\varepsilon$, independently of other
hypotheses. Conditional on the set of true null hypotheses, p-values are independent with
uniform distribution for null hypotheses and are iid according to some non-uniform distribution
for the non-null hypotheses. Under this model, we show that both Lond and Lord achieve a
nearly linear number of discoveries.

Numerical Validation. We validate our procedures on synthetic and real data in Section 6, showing
that they control FDR and mFDR in an online setting. We further compare them with
the alpha-investing method [FS07], and with BH and Bonferroni procedures. We observe that
our online procedures are nearly as powerful as BH, with often much smaller FDR.

In addition, we corroborate our results regarding the total discovery rate. 

In the rest of the introduction, we provide definitions of FWER, FDR and mFDR, and discuss
related work. In Section 2, we present our procedures, Lord and Lond , that control FDR and mFDR 
in an online manner. Section 3 describes alpha-investing rules proposed by [FS07] for controlling 
mFDR and explains the differences between these rules and our procedures. We discuss in Section 4.1 
how Lord and Lond leverage the domain knowledge to achieve higher statistical power. In Section 5, 
we compute discovery rate of Lord and Lond algorithms under the mixture model, showing their 
order optimality. We evaluate performance of Lord , Lond , Bonferroni, BH and an alpha-investing 
rule on synthetic examples in Section 6. Proofs of the main theorems and lemmas are provided in Section 7,
with several technical steps deferred to Appendices. 

1.1 Different criteria: FWER, FDR and mFDR 

Consider an ordered sequence of null hypotheses $\mathcal{H} = (H_i)_{i \ge 1}$, where $H_i$ concerns the value of a
parameter $\theta_i$. Without loss of generality, assume that $H_i = \{\theta_i = 0\}$. Rejecting null hypothesis
$H_i$ means that $\theta_i$ is discovered to be significant. Let $\Theta$ denote the set of possible values for the
parameters. We further let $p_i$ be the p-value of test $H_i$, whose distribution depends on the value $\theta_i$.
Under the null hypothesis $H_i : \theta_i = 0$, the corresponding p-value is uniformly random in $[0,1]$:

$$ p_i \sim \mathcal{U}([0,1]). $$

We let $\mathcal{H}(n) = (H_1, \dots, H_n)$ be the collection of the first $n$ hypotheses in the stream. The statistic
$T_i$ is the indicator that a discovery occurs at time $i$ and $D(n)$ denotes the number of discoveries in
$\mathcal{H}(n)$, hence

$$ D(n) = \sum_{i=1}^{n} T_i. $$

We also let the random variable $F_i^\theta$ be the indicator that a false discovery occurs at time step $i$ and
$V^\theta(n)$ be the number of false discoveries in $\mathcal{H}(n)$, i.e., the number of hypotheses that are incorrectly
rejected. Therefore,

$$ V^\theta(n) = \sum_{i=1}^{n} F_i^\theta. $$

Throughout the paper, the superscript $\theta$ is used to distinguish unobservable variables such as $V^\theta(n)$
from statistics such as $D(n)$. However, we drop the superscript when it is clear from the context.
There are various criteria relevant to the multiple testing problem.


• Family-wise error rate (FWER): The probability of falsely rejecting any of the null hypotheses
in $\mathcal{H}(n)$:

$$ \mathrm{FWER}(n) = \sup_{\theta \in \Theta} \mathbb{P}_\theta\big( V^\theta(n) \ge 1 \big). \tag{4} $$

• False discovery rate (FDR): This criterion was introduced by Benjamini and Hochberg [BH95],
and is the expected proportion of false discoveries among the rejected hypotheses. We first
define false discovery proportion (FDP) as follows. For $n \ge 1$,

$$ \mathrm{FDP}^\theta(n) = \frac{V^\theta(n)}{D(n) \vee 1}. $$

The false discovery rate is defined as

$$ \mathrm{FDR}(n) = \sup_{\theta \in \Theta} \mathbb{E}_\theta\big( \mathrm{FDP}^\theta(n) \big). \tag{5} $$

• m-False discovery rate (mFDR): The ratio of the expected number of false rejections to the
expected number of rejections:

$$ \mathrm{mFDR}_\eta(n) = \sup_{\theta \in \Theta} \frac{\mathbb{E}_\theta(V^\theta(n))}{\mathbb{E}_\theta(D(n)) + \eta}. \tag{6} $$


Note that while FDR controls a property of the realized set of tests, mFDR is the ratio of two 
expectations over many realizations. In general the gap between these metrics can be significant. 
(See Figures 5(a) and 5(b).) 
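
To make the distinction between the two criteria concrete, the sketch below computes the false discovery proportion of a single realization, and then empirical analogues of FDR (an average of ratios) and mFDR (a ratio of averages) over several realizations. The helper names and the two toy realizations are illustrative assumptions, and the supremum over $\theta$ in the definitions is not modelled.

```python
def fdp(rejected, is_null):
    """False discovery proportion V / (D v 1) for one realization."""
    discoveries = sum(rejected)
    false_discoveries = sum(r and h for r, h in zip(rejected, is_null))
    return false_discoveries / max(discoveries, 1)


def empirical_fdr(runs):
    """Average of the FDP over realizations (an average of ratios)."""
    return sum(fdp(rejected, is_null) for rejected, is_null in runs) / len(runs)


def empirical_mfdr(runs, eta=1.0):
    """Mean false discoveries divided by (mean discoveries + eta), a ratio of averages."""
    mean_v = sum(sum(r and h for r, h in zip(rej, nul)) for rej, nul in runs) / len(runs)
    mean_d = sum(sum(rej) for rej, _ in runs) / len(runs)
    return mean_v / (mean_d + eta)


is_null = [True, True, False, False]
runs = [
    ([True, False, False, False], is_null),   # one discovery, and it is false: FDP = 1
    ([False, False, True, True], is_null),    # two discoveries, both true: FDP = 0
]
fdr_hat = empirical_fdr(runs)    # (1 + 0) / 2
mfdr_hat = empirical_mfdr(runs)  # 0.5 / (1.5 + 1)
```

In this toy case the two quantities differ by more than a factor of two, which illustrates how large the gap between the criteria can be.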


1.2 Further related work 

We list below a few lines of research that are related to our work. 

General context. An increasing effort has been devoted to reducing the risk of fallacious research findings.
Some of the prevalent issues such as publication bias, lack of replicability and multiple comparisons
on a dataset were discussed in Ioannidis's 2005 papers [Ioa05b, Ioa05a] and in [PSA11].

Statistical databases. Concerned with the above issues and the importance of data sharing in the 
genetics community, [RAN14] proposed an approach to public database management, called Quality 
Preserving Database (QPD). The premise of QPD is to make a shared data resource amenable to 
perpetual use for hypothesis testing while controlling FWER and maintaining statistical power of
the tests. In this scheme, for testing a new hypothesis, the investigator should pay a price in form of 
additional samples that should be added to the database. The number of required samples for each 
test depends on the required effect size and the power for the corresponding test. A key feature of 
QPD is that controlling type I error is performed at the management layer and the investigator is 
not concerned with p-values for the tests. Instead, investigators provide effect size, assumptions on 
the distribution of the data, and the desired statistical power. A critical limitation of QPD is that 
all samples, including those currently in the database and those that will be added, are assumed to 
have the same quality and are coming from a common underlying distribution. Motivated by similar 
concerns in practical data analysis, [DFH+14] applies insights from differential privacy to efficiently
use samples to answer adaptively chosen estimation queries. These papers however do not address 
the problem of controlling FDR in online multiple testing. 

Online feature selection. Building upon alpha-investing procedures, [LFU11] develops VIF, a method 
for feature selection in large regression problems. VIF is accurate and computationally very efficient; 
it uses a one-pass search over the pool of features and applies alpha-investing to test each feature for 
adding to the model. VIF regression avoids overfitting leveraging the property that alpha-investing 
controls mFDR. Similarly, one can incorporate Lord and Lond procedures in VIF regression to 
perform fast online feature selection and provably avoid overfitting. 

High-dimensional and sparse regression. There has been significant interest over the last two years in
developing hypothesis testing procedures for high-dimensional regression, especially in conjunction
with sparsity-seeking methods. Procedures for computing p-values of low-dimensional coordinates
were developed in [ZZ14, VdGBR+14, JM14a, JM14b, JM13]. Sequential and selective inference
methods were proposed in [LTTT14, FST14, TLTT14]. Methods to control FDR were put forward
in [BC14, BBS+14].

As exemplified by VIF regression, online hypothesis testing methods can be useful in this context
as they allow one to select a subset of regressors through a one-pass procedure. They can also be used in
conjunction with the methods of [LTTT14], where a sequence of hypotheses is generated by including
an increasing number of regressors (e.g. sweeping values of the regularization parameter).

To the best of our knowledge, the only procedure that compares with the ones we develop is
the ForwardStop rule of [GWCT13]. Note, however, that this approach falls short of addressing
the issues we consider, for several reasons. (i) It is not online, at least in the form presented in
[GWCT13], since it rejects the first $k$ null hypotheses, where $k$ depends on all the p-values. (ii) It
requires knowledge of all past p-values (not only discovery events) to compute the current score. (iii)
Since it is constrained to reject all hypotheses before $k$, and accept them after, it cannot achieve any
discovery rate increasing with $n$, let alone nearly linear in $n$. For instance, in the mixture model of
Section 5, if the fraction of true non-null is $\varepsilon < \alpha$, then ForwardStop achieves $O(1)$ discoveries out
of $\Theta(n)$ true non-null. In other words, its power is of order $1/n$ in this simple case (no matter what
the strength of the signal for non-null hypotheses is).

1.3 Notations 

For two functions $f(n)$ and $g(n)$, the notation $f(n) = \Omega(g(n))$ means that $f$ is bounded below by
$g$ asymptotically, namely, there exist a positive constant $C$ and $n_0 > 0$ such that $f(n) \ge C g(n)$
for $n > n_0$. The notation $f(n) = \Theta(g(n))$ indicates that $f$ is bounded both above and below
by $g$ asymptotically, i.e., for some $C_1, C_2 > 0$ and some positive integer $n_0$ we have
$C_1 g(n) \le f(n) \le C_2 g(n)$ for $n > n_0$. Throughout, $\phi(t) = e^{-t^2/2}/\sqrt{2\pi}$ and $\Phi(x) = \int_{-\infty}^{x} \phi(t)\,dt$. We also use
$x \vee y = \max(x, y)$ and $x \wedge y = \min(x, y)$.


2 Main results 

We present two algorithms for online control of FDR and mFDR. The first algorithm sets the 
significance levels for tests based on total number of discoveries made so far. We name this algorithm 
Lond which stands for (significance) Levels based On Number of Discoveries. The second algorithm 
sets the significance level at each step based on the time the last discovery has occurred. We name 
this algorithm Lord which stands for (significance) Levels based On Recent Discovery. 


2.1 Lond algorithm 

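We choose any sequence of nonnegative numbers $\beta = (\beta_i)_{i=1}^{\infty}$ such that $\sum_{i=1}^{\infty} \beta_i = \alpha$. The values of
significance levels $\alpha_i$ are chosen as follows:

$$ \alpha_i = \beta_i \big( D(i-1) + 1 \big). \tag{7} $$

Theorem 2.1. Suppose that conditional on $D(i-1)$, we have

$$ \forall \theta \in \Theta, \quad \mathbb{P}_{\theta_i = 0}\big( T_i = 1 \mid D(i-1) \big) \le \mathbb{E}\big( \alpha_i \mid D(i-1) \big), \tag{8} $$

with $T_i$ given by equation (2). Then rule (7) controls FDR and mFDR at level less than or equal to
$\alpha$, i.e., for all $n \ge 1$, $\mathrm{FDR}(n) \le \alpha$ and $\mathrm{mFDR}(n) \le \alpha$.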

Corollary 2.2. If the p-values are independent, then rule (7) controls FDR and mFDR at level less
than or equal to $\alpha$.


Note that we do not require p-values to be independent, although independence gives the simplest case where
Condition (8) holds true. Moreover, in (8), we condition on the number of discoveries so far. This
is in contrast to the alpha-investing method [FS07], which uses information on acceptance of the previous
hypotheses $(T_1, \dots, T_{i-1})$, or adaptive testing with batch sequential arrivals (see e.g. [LW99]), which
exploits information on the observed test statistics.
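
A minimal sketch of the Lond rule, $\alpha_i = \beta_i (D(i-1) + 1)$, is given below. The function name `lond` and the geometric choice $\beta_i = \alpha 2^{-i}$ in the usage lines are assumptions made for illustration; any nonnegative sequence summing to $\alpha$ could be used instead.

```python
def lond(p_values, beta):
    """Run the Lond rule alpha_i = beta_i * (D(i-1) + 1) over a stream of p-values.

    beta[i-1] holds beta_i. Returns the decisions T_i and the levels alpha_i.
    """
    if len(beta) < len(p_values):
        raise ValueError("beta must provide one value per hypothesis")
    discoveries = 0  # D(i-1): number of rejections before step i
    decisions, levels = [], []
    for p, beta_i in zip(p_values, beta):
        alpha_i = beta_i * (discoveries + 1)
        reject = p <= alpha_i
        decisions.append(reject)
        levels.append(alpha_i)
        discoveries += int(reject)
    return decisions, levels


alpha = 0.05
beta = [alpha * 2.0 ** (-i) for i in range(1, 5)]  # 0.025, 0.0125, 0.00625, 0.003125
decisions, levels = lond([0.02, 0.02, 0.02, 0.5], beta)
```

Only the running count of discoveries is carried from one step to the next, which is all the information the rule needs.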


Remark 2.3. The update rule $\alpha_i = \beta_i (D(i-1) \vee 1)$ also controls FDR, but leads to a slightly smaller
number of discoveries. For this rule, the following holds true, which is a more stringent control over
mFDR:

$$ \sup_{\theta \in \Theta} \frac{\mathbb{E}_\theta(V^\theta(n))}{\mathbb{E}_\theta(D(n) \vee 1)} \le \alpha. \tag{9} $$


2.2 Lord algorithm 

In the second algorithm the significance levels $\alpha_i$ are adapted to the time of the last discovery, rather
than the number of discoveries so far. Concretely, choose any sequence of nonnegative numbers
$\beta = (\beta_i)_{i=1}^{\infty}$ such that $\sum_{i=1}^{\infty} \beta_i = \alpha$. We then set $\alpha_j = \beta_j$ until a discovery occurs. If $H_j$ is rejected,
then the sequence is renewed by choosing $\alpha_{j+k} = \beta_k$, for $k \ge 1$, until we reach the next discovery.
In other words, letting $\tau_i$ be the index of the most recent discovery before time step $i$,

$$ \tau_i = \max\{ \ell < i : H_\ell \text{ is rejected} \} $$

(with $\tau_1 = 0$) we set

$$ \alpha_i = \beta_{i - \tau_i}. \tag{10} $$

In the next theorem we show that Lord controls $\mathrm{mFDR}_1(n)$ for all $n \ge 1$ and also controls FDR at
every discovery.

Theorem 2.4. Suppose that conditional on $\tau_i$, we have

$$ \forall \theta \in \Theta, \quad \mathbb{P}_{\theta_i = 0}\big( T_i = 1 \mid \tau_i \big) \le \mathbb{E}\big( \alpha_i \mid \tau_i \big). \tag{11} $$

Then, the rule (10) controls mFDR to be less than or equal to $\alpha$, i.e., $\mathrm{mFDR}_1(n) \le \alpha$ for all $n \ge 1$.
Further, it controls FDR at every discovery. More specifically, letting $t_k$ be the time of the $k$-th discovery,
we have the following for all $k \ge 1$,

$$ \sup_{\theta \in \Theta} \mathbb{E}_\theta\big( \mathrm{FDP}(t_k)\, \mathbb{I}(t_k < \infty) \big) \le \alpha. \tag{12} $$

Corollary 2.5. If the p-values are independent, then rule (10) controls FDR and mFDR at level
less than or equal to $\alpha$.

Remark 2.6. For the update rule (10), the following holds true, which is stronger than the control
of $\mathrm{mFDR}_1(n)$:

$$ \sup_{\theta \in \Theta} \frac{\mathbb{E}_\theta(V^\theta(n))}{\mathbb{E}_\theta(D(n-1) + 1)} \le \alpha. \tag{13} $$

We refer to Section 7.2 for the proof of Theorem 2.4 and Remark 2.6.
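
The Lord rule can be sketched in the same style: the level at step $i$ is $\beta_{i - \tau_i}$, where $\tau_i$ is the index of the most recent discovery before $i$ (zero if there has been none), so the sequence $\beta$ restarts after every rejection. The function name `lord` and the geometric $\beta$ used in the usage lines are illustrative assumptions.

```python
def lord(p_values, beta):
    """Run the Lord rule alpha_i = beta_{i - tau_i} over a stream of p-values.

    beta[k-1] holds beta_k. Returns the decisions T_i and the levels alpha_i.
    """
    if len(beta) < len(p_values):
        raise ValueError("beta must provide one value per hypothesis")
    last_discovery = 0  # tau_i: index of the most recent rejection, 0 if none yet
    decisions, levels = [], []
    for i, p in enumerate(p_values, start=1):
        alpha_i = beta[i - last_discovery - 1]  # beta_{i - tau_i}, shifted to 0-based
        reject = p <= alpha_i
        decisions.append(reject)
        levels.append(alpha_i)
        if reject:
            last_discovery = i  # the sequence restarts at beta_1 on the next step
    return decisions, levels


alpha = 0.05
beta = [alpha * 2.0 ** (-k) for k in range(1, 6)]  # beta_1 = 0.025, beta_2 = 0.0125, ...
decisions, levels = lord([0.5, 0.01, 0.02, 0.5, 0.001], beta)
```

In the usage example the third hypothesis is tested at level $\beta_1$ again because the second one was rejected.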


2.3 Online control of FDR under dependency 

The BH procedure described in the introduction controls FDR when the p-values are independent.
In [BY01], Benjamini and Yekutieli introduced a property called positive regression dependency from
a subset $I_0$ (PRDS on $I_0$) to capture the positive dependency structure among the test statistics.
They relaxed the independence assumption on p-values by showing that if the joint distribution of
the test statistics is PRDS on the subset of test statistics corresponding to true null hypotheses, then
BH controls FDR. (See Theorem 1.3 in [BY01].) Further, they proved that BH controls FDR under
general dependency if its threshold is adjusted by replacing $\alpha$ with $\alpha / (\sum_{i=1}^{n} 1/i)$ in equation (1). Here,
we prove an analogous result for the Lond algorithm for online control of FDR.

Theorem 2.7. If Lond is conducted with $\tilde{\beta}_i = \beta_i / (\sum_{j=1}^{i} 1/j)$ in place of $\beta_i$ in (7), it always controls
FDR(n) at level less than or equal to $\alpha$, for all $n \ge 1$, without requiring condition (8).

Theorem 2.7 is proved in Section 7.3. 
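
The adjustment for dependent p-values can be sketched as a rescaling of the sequence $\beta$. The code below assumes that the adjustment divides $\beta_i$ by the partial harmonic sum $\sum_{j=1}^{i} 1/j$, by analogy with the Benjamini-Yekutieli correction recalled above; the helper name `harmonic_adjusted_beta` is hypothetical. The adjusted sequence would then be used in the Lond rule in place of $\beta$.

```python
def harmonic_adjusted_beta(beta):
    """Divide beta_i by the partial harmonic sum 1 + 1/2 + ... + 1/i (assumed adjustment)."""
    adjusted = []
    harmonic = 0.0
    for i, beta_i in enumerate(beta, start=1):
        harmonic += 1.0 / i
        adjusted.append(beta_i / harmonic)
    return adjusted


beta = [0.3, 0.3, 0.3]
beta_tilde = harmonic_adjusted_beta(beta)  # roughly 0.3, 0.2, 0.1636
```

Since the harmonic sum grows only logarithmically in $i$, the price paid for robustness to dependence is a slowly growing reduction of the levels.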



3 Comparison with alpha-investing 

Alpha-investing was developed by Foster and Stine [FS07] to control mFDR in an online multiple
testing problem. The basic idea is to treat the significance level $\alpha$ as a budget to be spent over the
sequence of tests; the rule earns an increment in its budget each time it rejects a hypothesis.
More precisely, alpha-investing rules assume an initial budget $W(0)$, and a rule for significance levels
$\alpha_j$ as a function of the form

$$ \alpha_j = f_{W(0)}(\{T_1, T_2, \dots, T_{j-1}\}). \tag{14} $$

Let $W(k) \ge 0$ denote the budget after $k$ tests. The outcomes of the tests change the available budget
as follows:

$$ W(j) - W(j-1) = \begin{cases} \omega, & \text{if } T_j = 1, \\ -\alpha_j/(1-\alpha_j), & \text{if } T_j = 0. \end{cases} $$

Choosing $\omega = \alpha$ and $W(0) \le \alpha\eta$, alpha-investing rules control $\mathrm{mFDR}_\eta$ under the condition that the
budget stays nonnegative almost surely [FS07]. Note that since alpha-investing proceeds sequentially,
it might stop the testing after some number of rejected hypotheses.

It is worth noting that Lond and Lord are not alpha-investing rules. To show this, consider the
case that all p-values are equal to one. Lond and Lord do not reject any of the hypotheses in this
scenario. Hence, both of them set the significance levels $\alpha_j = \beta_j$ for all $j$ (cf. rules (7) and (10)). The
budget after $n$ iterations works out at

$$ W(n) = W(0) - \sum_{j=1}^{n} \frac{\beta_j}{1 - \beta_j}. $$

Given that $W(0) \le \alpha$ and $\sum_{j=1}^{\infty} \beta_j = \alpha$, the above budget can become negative for some value of $n$,
because $\beta_j/(1-\beta_j) > \beta_j$ whenever $\beta_j > 0$, so the total amount spent eventually exceeds $\alpha$.
This is not allowed in the alpha-investing method.
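
The argument above can be checked numerically. The sketch below tracks the alpha-investing budget when every test is an acceptance and the level $\beta_j$ is charged $\beta_j/(1-\beta_j)$; the geometric sequence $\beta_j = \alpha 2^{-j}$ and the choice $W(0) = \alpha$ are assumptions made for the example, and the function name is illustrative.

```python
def budget_after_acceptances(beta, initial_budget):
    """Budget path W(1), ..., W(n) when T_j = 0 for all j and alpha_j = beta_j."""
    budget = initial_budget
    path = []
    for beta_j in beta:
        budget -= beta_j / (1.0 - beta_j)  # cost of an acceptance at level beta_j
        path.append(budget)
    return path


alpha = 0.05
beta = [alpha * 2.0 ** (-j) for j in range(1, 31)]
path = budget_after_acceptances(beta, initial_budget=alpha)
first_negative = next(n for n, w in enumerate(path, start=1) if w < 0)
```

With these particular values the budget turns negative at the sixth test, even though the levels themselves sum to less than $\alpha$.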

There is no general guarantee that alpha-investing controls FDR. Theorem 2 in [FS07] discusses a
special case in which the testing procedure stops after a deterministic number of rejections and assumes
that the procedure has uniform control of $\mathrm{mFDR}_0$. In addition, it is proved to control mFDR
under the assumption that $\mathbb{P}_{\theta_i = 0}(T_i = 1 \mid T_1, \dots, T_{i-1}) \le \alpha_i$ for $i \ge 1$. In contrast, the Lond and
Lord procedures control FDR and mFDR. In Section 2.3 we further proposed an adjustment to
Lond that addresses arbitrary dependency among the test statistics. Section 5 studies the rate of
expected number of discoveries made by our procedures. We are not aware of any analogous analysis
for alpha-investing rules.

Let us finally mention that the recent work of Aharoni and Rosset [AR14] introduces a very broad
class of online rules called generalized alpha-investing. We believe that our Lond algorithm fits within
this framework. Note however that [AR14], again, does not guarantee FDR control for generalized
alpha-investing. (Instead it proves mFDR control, adapting the arguments of [FS07].)

4 Discussion 

4.1 Effect of domain knowledge 

The Lond algorithm sets the significance levels based on the number of discoveries made so far. Therefore
there is an inherent positive effect from previous discoveries onto the next significance levels. In other
words, a larger number of discoveries leads to larger significance levels for future tests and hence
more discoveries. This property of Lond is particularly beneficial when the researcher has knowledge
about the underlying domain. In this case, she can order hypotheses such that those that are most
likely to be rejected (i.e., truly non-null hypotheses) appear first. This yields a large number of
discoveries at the very first steps, whose effect on the subsequent significance levels is lasting.

Likewise, Lord can also leverage domain knowledge. It renews the sequence of significance
levels after each discovery. Hence, if the truly non-null hypotheses arrive in batches,
Lord assigns a high significance level to them, namely $\beta_1$,$^1$ which yields an increased statistical
power.

In Section 6, we study the effect of domain knowledge via numerical simulations. 

4.2 Choosing sequence $\beta$

Both the Lond and Lord algorithms involve a sequence $\beta = (\beta_\ell)_{\ell=1}^{\infty}$ such that $\sum_{\ell=1}^{\infty} \beta_\ell = \alpha$. Examples
of such sequences are:

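$$ \beta_\ell = \frac{C(\alpha,\nu)}{\ell^{\nu}}, \qquad \beta_\ell = \frac{\tilde{C}(\alpha,\nu)}{\ell\,(\log \ell \vee 1)^{\nu}}, $$

where $\nu > 1$ and the constants $C$, $\tilde{C}$ are chosen in a way to ensure $\sum_{\ell=1}^{\infty} \beta_\ell = \alpha$.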

In case there is an upper bound $n$ on the number of hypotheses to be tested, such information
can be exploited by choosing $C$ or $\tilde{C}$ to satisfy $\sum_{\ell=1}^{n} \beta_\ell = \alpha$. This leads to larger values of $\beta_\ell$ and
in turn larger significance levels for tests. Further, the parameter $\nu$ controls how fast the sequence $\beta$
decreases. If there is prior information about the arrival times of truly false hypotheses (e.g., batch
patterns, size of the batches, ...), this information can be used in choosing a proper value of $\nu$.
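
The effect of a known horizon on the normalizing constant can be sketched for the power-law family $\beta_\ell = C/\ell^{\nu}$. The function below is an illustration under stated assumptions: for the infinite-horizon case it replaces the exact series $\sum_{\ell \ge 1} \ell^{-\nu}$ by a truncated sum plus the integral bound $N^{1-\nu}/(\nu-1)$ on the tail, so the resulting sequence sums to slightly less than $\alpha$. The name `power_law_constant` and the truncation point are choices made here, not part of the paper.

```python
import math


def power_law_constant(alpha, nu, horizon=None, terms=100_000):
    """Constant C such that beta_l = C / l**nu sums to (at most) alpha.

    With a finite horizon n the sum runs over l = 1..n exactly; otherwise the
    infinite series is bounded above by a truncated sum plus an integral tail bound.
    """
    if nu <= 1:
        raise ValueError("nu must exceed 1 for the series to converge")
    if horizon is not None:
        return alpha / sum(l ** -nu for l in range(1, horizon + 1))
    partial = sum(l ** -nu for l in range(1, terms + 1))
    tail_bound = terms ** (1.0 - nu) / (nu - 1.0)
    return alpha / (partial + tail_bound)


alpha, nu = 0.05, 2.0
c_finite = power_law_constant(alpha, nu, horizon=100)
c_infinite = power_law_constant(alpha, nu)
c_exact = alpha / (math.pi ** 2 / 6)  # closed form available only because nu = 2
```

Knowing that at most 100 hypotheses will be tested gives a larger constant, and hence uniformly larger levels, than the infinite-horizon normalization.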

5 Total discovery rate 

The proposed procedures control FDR and mFDR to be below $\alpha$. However, apart from FDR, the
total discovery rate is another important characteristic of a testing procedure. For instance, the
procedure that never rejects a hypothesis achieves zero FDR but is useless.

In this section, we aim at comparing the total discovery rate for BH and our algorithms. We
show that although our algorithms use only the outcomes of the previous tests in the stream, they
have comparable performance to BH in terms of discovery rate.

We focus on the mixture model wherein each null hypothesis $H_i$ is false with probability $\varepsilon$
independently of other hypotheses. Further, the test statistics and hence the p-values are independent.
Under the null hypothesis $H_i$ we have $p_i \sim \mathcal{U}([0,1])$ and under its alternative, $p_i$ is distributed
according to a non-uniform distribution whose CDF is denoted by $F$. Clearly, in this model the
p-values have the same marginal distribution. Let $G(\cdot)$ be the CDF of this marginal distribution,
i.e., $G(x) = (1-\varepsilon)x + \varepsilon F(x)$. For clarity in presentation, we assume $F(x)$, and hence $G(x)$, is
continuous.
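
For experimentation, p-values from the mixture model can be simulated as in the sketch below. It assumes a Gaussian alternative: each test statistic is $N(\theta_i, 1)$ with $\theta_i = 0$ under the null and $\theta_i = \mu$ otherwise, and the p-value is two-sided. The function name, the seed handling and the parameter values are illustrative assumptions.

```python
import random
from statistics import NormalDist


def sample_mixture_p_values(n, eps, mu, seed=0):
    """Draw n two-sided p-values; each hypothesis is non-null with probability eps."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    p_values, is_null = [], []
    for _ in range(n):
        null = rng.random() >= eps           # H_i is false with probability eps
        theta = 0.0 if null else mu
        z = rng.gauss(theta, 1.0)            # test statistic Z_i ~ N(theta_i, 1)
        p = 2.0 * (1.0 - std_normal.cdf(abs(z)))
        p_values.append(p)
        is_null.append(null)
    return p_values, is_null


p_values, is_null = sample_mixture_p_values(n=2000, eps=0.2, mu=3.0)
null_ps = [p for p, h in zip(p_values, is_null) if h]
alt_ps = [p for p, h in zip(p_values, is_null) if not h]
```

Null p-values are uniform on $[0,1]$ by construction, while the non-null ones concentrate near zero, which is the non-uniform distribution $F$ of the model.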


1 Except for possibly the first hypothesis in the batch. 

5.1 Discovery rate of Lond

In the next theorem we lower bound the total discovery rate of Lond for a specific choice of sequence
$\beta = (\beta_\ell)_{\ell=1}^{\infty}$. The goal here is to show that Lond has a nearly linear number of discoveries, namely
$\mathbb{E}(D^{\rm Lond}(n)) = \Omega(n^{1-a})$, for arbitrary $0 < a < 1$ and all $n \ge 1$.

Theorem 5.1. Consider the mixture model for the p-values with $G(x) = (1-\varepsilon)x + \varepsilon F(x)$ denoting
the CDF of the marginal distribution. Assume that for some constants $x_0, \lambda > 0$ and $\kappa \in (0,1)$, we
have $F(x) \ge \lambda x^{\kappa}$ for all $x < x_0$. Choose the sequence $\beta = (\beta_\ell)_{\ell=1}^{\infty}$ as $\beta_\ell = C \ell^{-\nu}$ with $1 < \nu < 1/\kappa$ and
$C$ set such that $\sum_{\ell=1}^{\infty} \beta_\ell = \alpha$. We further let $D^{\rm Lond}(n)$ denote the number of discoveries by applying
the Lond algorithm with sequence $\beta$ to the set of p-values $(p_1, p_2, \dots, p_n)$. Then, there exists a constant $\tilde{C}$
such that for any fixed $0 < \delta < 1$ and any $n \ge 1$, the following holds.

p{ll L0ND (n) > (S)^} > l-<5. (15) 

We refer to Section 7.4 for the proof of Theorem 5.1. 

Equation (15) implies that

E(D L0ND (n)) > (1-<S)(5)^. 

Note that since $1 < \nu < 1/\kappa$ is arbitrary, the exponent $(1-\kappa\nu)/(1-\kappa)$ can be made arbitrarily close
to 1. It is therefore clear from equation (15) that for any fixed $0 < a < 1$ and all $n \ge 1$, we have

$$ \mathbb{E}(D^{\rm Lond}(n)) = \Omega(n^{1-a}). $$


5.2 Discovery rate of Lord 

As we show in the following theorem, the Lord algorithm leads to a linear total discovery rate.

Theorem 5.2. Consider the mixture model for the p-values with $G(x) = (1-\varepsilon)x + \varepsilon F(x)$ denoting the
CDF of the marginal distribution. We further let $D^{\rm Lord}(n)$ denote the number of discoveries by applying
the Lord algorithm with sequence $\beta$ to the set of p-values $\{p_1, p_2, \dots, p_n\}$. Then, $\lim_{n\to\infty} (D^{\rm Lord}(n)/n)$
exists almost surely and

lim — d Lonb (n) > A{G, j3) , 4(G,^)e fVe^« G(W ) . (16) 

n —>oc n — — \ J 

k =1 


Further,

$$ \lim_{n\to\infty} \frac{1}{n}\, \mathbb{E}(D^{\rm Lord}(n)) \ge A(G, \beta). \tag{17} $$

Corollary 5.3. Suppose that there exists a sequence $\beta = (\beta_\ell)_{\ell=1}^{\infty}$ such that the following conditions
hold true:

1. $\forall \ell$, $\beta_\ell \in (0,1)$ and $\sum_{\ell=1}^{\infty} \beta_\ell = \alpha$.

2. There exist constants $L_0 > 0$ and $c > 1$ such that for $\ell > L_0$, we have $G(\beta_\ell) \ge c/\ell$.

Then, we have almost surely


lim -D Lond 

n—>oo n 


ip) > (Lq + 


(c — 1 ) 1/0 


c— 1 


-1 


Similarly, 


lim -E (D LmD (n)) > (l 0 + 

n—>oo Tl V 


(c- 1)4 


c—1 
0 


-1 


(18) 


(19) 


In words, upon finding a sequence $\beta$ which satisfies the conditions of Corollary 5.3, Lord gives a linear
discovery rate

$$ D^{\rm Lord}(n) = \Theta(n). $$


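Example 1. Suppose that there exist constants $\lambda > 0$, $\kappa \in (0,1)$, and $x_0$ such that $F(x) \ge \lambda x^{\kappa}$
for all $x < x_0$. Define

\[
C_* = \sum_{\ell=1}^{\infty} \frac{\log \ell}{\ell^{1/\kappa}} < \infty.
\]

Letting $\beta_\ell = (\alpha \log \ell)/(C_* \ell^{1/\kappa})$, we have $\sum_{\ell=1}^{\infty} \beta_\ell = \alpha$. Also, for any $c > 1$ we can choose $L_0$
sufficiently large such that for $\ell > L_0$

\[
G(\beta_\ell) \ge (1-\varepsilon)\beta_\ell + \varepsilon\lambda\beta_\ell^{\kappa} \ge \varepsilon\lambda\, \frac{(\alpha \log \ell)^{\kappa}}{C_*^{\kappa}\, \ell} \ge \frac{c}{\ell}.
\]

Therefore, both conditions in Corollary 5.3 are satisfied and thus Lord gives a linear discovery rate.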


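Example 2 (Mixture of Gaussians). Suppose that we are getting samples $Z_i \sim N(\theta_i, 1)$ and
we want to test the hypothesis $H_i: \theta_i = 0$ versus the alternative $\theta_i = \mu$. In this case, two-sided p-values
are given by

\[
p_i = 2\big(1 - \Phi(|Z_i|)\big).
\]

Therefore,

\begin{align}
F(\nu) &= \mathbb{P}_{\theta_i = \mu}(p_i \le \nu) \notag\\
&= \mathbb{P}\big(\Phi^{-1}(1-\nu/2) \le |Z_i|\big) \notag\\
&= 2 - \Phi\big(\Phi^{-1}(1-\nu/2) + \mu\big) - \Phi\big(\Phi^{-1}(1-\nu/2) - \mu\big). \notag
\end{align}

Let $\zeta = \Phi^{-1}(1-\nu/2)$ and thus $\Phi(-\zeta) = \nu/2$. Using this notation, we write

\[
F(\nu) = 2 - \Phi(\zeta+\mu) - \Phi(\zeta-\mu) \ge \Phi(\mu - \zeta). \tag{20}
\]

Recall the following classical bounds on the CDF of the normal distribution for $t > 0$, where $\phi$ denotes the standard normal density:

\[
\frac{\phi(t)}{t}\Big(1 - \frac{1}{t^2}\Big) \le \Phi(-t) \le \frac{\phi(t)}{t}. \tag{21}
\]

Applying the above inequalities in equation (20), we obtain

\begin{align}
F(\nu) &\ge \Phi(-\zeta)\, \frac{\Phi(\mu-\zeta)}{\Phi(-\zeta)} \notag\\
&\ge \frac{\nu}{2}\, \frac{\phi(\zeta-\mu)}{\phi(\zeta)}\, \frac{\zeta}{\zeta-\mu} \Big(1 - \frac{1}{(\zeta-\mu)^2}\Big) \notag\\
&= \frac{\nu}{2}\, \frac{\zeta}{\zeta-\mu}\, e^{\mu\zeta - \mu^2/2} \Big(1 - \frac{1}{(\zeta-\mu)^2}\Big). \notag
\end{align}

Using the LHS bound in (21), it is easy to see that for small enough $\nu$, $\zeta \ge \sqrt{\log(2/\nu)}$. Therefore,
for small enough $\nu$, we obtain

\[
F(\nu) \ge \frac{\nu}{2} \exp(-\mu^2/2) \exp\big(\mu\sqrt{\log(2/\nu)}\big). \tag{22}
\]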

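Fix $a > 1$ and let

\[
C_* = \sum_{\ell=1}^{\infty} \frac{1}{\ell \log^{a}(\ell+1)}.
\]

Choosing

\[
\beta_\ell = \frac{\alpha}{C_*\, \ell \log^{a}(\ell+1)},
\]

clearly $\sum_{\ell=1}^{\infty} \beta_\ell = \alpha$. Furthermore, invoking equation (22) it is easy to see that for any $c > 1$ there
exists $L_0$ sufficiently large, such that the following holds true for all $\ell > L_0$:

\[
G(\beta_\ell) \ge \varepsilon F(\beta_\ell) \ge \frac{c}{\ell}.
\]

Therefore, both conditions in Corollary 5.3 are satisfied and the expected number of discoveries
would be $\Theta(n)$.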
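
The closed form for $F(\nu)$ and the lower bound (20) in Example 2 are easy to check numerically. The sketch below is illustrative only; it uses Python's standard-library `statistics.NormalDist` for $\Phi$ and $\Phi^{-1}$, and the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def nonnull_pvalue_cdf(v, mu):
    """F(v) = P(p_i <= v) for the two-sided p-value when Z_i ~ N(mu, 1)."""
    zeta = STD_NORMAL.inv_cdf(1.0 - v / 2.0)
    return 2.0 - STD_NORMAL.cdf(zeta + mu) - STD_NORMAL.cdf(zeta - mu)


def lower_bound_20(v, mu):
    """Right-hand side of (20): Phi(mu - zeta) with zeta = Phi^{-1}(1 - v/2)."""
    zeta = STD_NORMAL.inv_cdf(1.0 - v / 2.0)
    return STD_NORMAL.cdf(mu - zeta)


if __name__ == "__main__":
    for v in (1e-4, 1e-2, 1e-1):
        print(v, nonnull_pvalue_cdf(v, mu=2.0), lower_bound_20(v, mu=2.0))
```

With $\mu = 0$ the p-value is uniform, so `nonnull_pvalue_cdf(v, 0.0)` should return `v` up to rounding, which is a convenient sanity check.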


6 Numerical experiments 

In this section, we compare the performance of the Lond and Lord algorithms with the BH, Bonferroni and
alpha-investing procedures in terms of FDR, mFDR and statistical power. Our evaluations are on
both synthetic and real data sets.


6.1 Synthetic data 

6.1.1 Independent p- values 

We consider a setup similar to that of [FS07]. A set of hypotheses $\mathcal{H}(n) = (H_1, \dots, H_n)$ is tested, where
each hypothesis concerns the mean of a normal distribution, $H_j: \theta_j = 0$. The parameters $\theta_j$ are set according
to a mixture model:

\[
\theta_j \sim
\begin{cases}
0 & \text{w.p. } 1-\pi, \\
N(0, \sigma^2) & \text{w.p. } \pi.
\end{cases} \tag{23}
\]

In the first experiment, we set $n = 1000$ and $\sigma^2 = 2\log n$. The test statistics are independent
normal random variables $Z_j \sim N(\theta_j, 1)$, for which the two-sided p-values work out at $p_j = 2\Phi(-|Z_j|)$.$^2$
Given these p-values, we compute FDR and mFDR for significance level $\alpha = 0.05$ by averaging
over 10,000 trials of the sequence of test statistics. For Lond and Lord we choose the sequence
$\beta = (\beta_\ell)_{\ell=1}^{\infty}$ as follows:

\[
\beta_\ell = \frac{C_*}{\ell \log^{2}(\ell \vee 2)}, \tag{24}
\]

with $C_*$ set in a way to ensure $\sum_{\ell=1}^{\infty} \beta_\ell = \alpha$.

The Bonferroni procedure used in our experiments applies the following decision rule:

\[
T_i =
\begin{cases}
1 & \text{if } p_i \le \beta_i \text{ (reject the null hypothesis } H_i), \\
0 & \text{otherwise (accept the null hypothesis } H_i).
\end{cases}
\]
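
As an illustration of the sequence (24) and the Bonferroni decision rule, here is a minimal sketch. One simplification is an assumption of this sketch and not of the paper: the constant $C_*$ is chosen so that the first $n$ terms sum to $\alpha$, instead of the full infinite series. The function names are hypothetical.

```python
import math


def beta_sequence(n, alpha):
    """beta_l = C / (l * log^2(max(l, 2))), with C chosen so the first n terms sum to alpha."""
    weights = [1.0 / (l * math.log(max(l, 2)) ** 2) for l in range(1, n + 1)]
    total = sum(weights)
    return [alpha * w / total for w in weights]


def bonferroni_decisions(pvalues, beta):
    """T_i = 1 if p_i <= beta_i (reject H_i), and T_i = 0 otherwise."""
    return [1 if p <= b else 0 for p, b in zip(pvalues, beta)]


if __name__ == "__main__":
    beta = beta_sequence(1000, alpha=0.05)
    print(beta[:3], sum(beta))
    print(bonferroni_decisions([1e-6, 0.5, 1e-4], beta))
```

Normalising over a finite horizon keeps $\sum_{\ell \le n} \beta_\ell = \alpha$, which is the property the decision rule relies on when only $n$ hypotheses are tested.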


We consider an alpha-investing rule, as explained in Section 3, with the following choice in equation (14):

\[
\alpha_j = \frac{W(j)}{1 + j - \tau_j},
\]

where $\tau_j$ denotes the time of the most recent discovery by time $j$. This rule was proposed by [FS07]
to show how context-dependent information can be incorporated into building alpha-investing rules.
When there is substantial side information that the first few hypotheses are likely to be rejected and
that the truly non-null hypotheses appear in clusters, this rule exploits such information to increase
power. The same rule has been used by [LFU11] in the VIF regression algorithm designed for online
feature selection.

We evaluate performance of the five algorithms under the following two scenarios: 


• Scenario I: Absence of domain knowledge. In this scenario, nonzero means $\theta_j$ appear randomly
in the stream of hypotheses as explained by equation (23).

• Scenario II: Presence of domain knowledge. We assume that an investigator has knowledge
about the underlying domain of research and she uses this information in choosing the hypotheses.
She first tests the (null) hypotheses that are most likely to be rejected. To simulate this
scenario, we sort the hypotheses in the stream according to the absolute values of means, i.e.,
$|\theta_j|$, in decreasing order. Those with larger $|\theta_j|$ appear earlier. Let us stress that the ordering
is based on $|\theta_j|$ and not the p-values.
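
The two scenarios can be sketched in a few lines. The code below is an illustrative sketch rather than the authors' simulation code. It assumes that non-null means are drawn from $N(0, \sigma^2)$ with $\sigma^2 = 2\log n$ and that p-values are two-sided, $p_j = 2\Phi(-|Z_j|)$; the function name `simulate_stream` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_stream(n, pi, rng, scenario_two=False):
    """Return (theta, pvalues) for one trial of the mixture model.

    Scenario I keeps the random order; Scenario II sorts hypotheses by
    decreasing |theta_j| before the test statistics are drawn.
    """
    sigma = math.sqrt(2.0 * math.log(n))
    theta = [rng.gauss(0.0, sigma) if rng.random() < pi else 0.0 for _ in range(n)]
    if scenario_two:
        # The ordering uses the means, not the p-values.
        theta.sort(key=abs, reverse=True)
    phi = NormalDist().cdf
    pvalues = [2.0 * phi(-abs(rng.gauss(t, 1.0))) for t in theta]
    return theta, pvalues


if __name__ == "__main__":
    theta, p = simulate_stream(1000, pi=0.2, rng=random.Random(0), scenario_two=True)
    print(theta[:3], p[:3])
```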

Figures 1(a) and 1(b) show FDR and mFDR achieved by the five procedures for several values
of $\pi$, the proportion of truly non-null hypotheses, under Scenario I. As we see, FDR and mFDR of
Lond and Bonferroni are almost identical under this setting. In Figure 1(c), we show the relative


2 The value of $\sigma$ controls the strength of the signal to be distinguished from noise. We set $\sigma^2 = 2\log n$ because for truly
null hypotheses ($\theta_i = 0$) we have $Z_i \sim N(0,1)$, and the maximum of $n$ standard normal variables is w.h.p. at most $\sqrt{2\log n}$.
Clearly, larger $\sigma$ leads to larger power and better control of FDR and mFDR.

Figure 1: FDR and mFDR of Lord , Lond , BH, Bonferroni and an alpha-investing rule under Scenario I. 
Figure (c) shows the relative power of the procedures to BH method. 


power of the procedures with respect to the BH method, under Scenario I. More precisely, letting $U^{\theta}(n) =
D(n) - V^{\theta}(n)$ be the number of correctly rejected hypotheses by the procedure of interest, we estimate

\[
\mathbb{E}\Big( \frac{U^{\theta}(n)}{U^{\theta}_{\rm BH}(n)} \Big)
\]

via averaging over 10,000 trials of test statistics. Note that for large values of $\pi$, the relative power

Figure 2: FDR and mFDR of Lord , Lond , BH, Bonferroni and an alpha-investing rule under Scenario II. 
Figure (c) shows the relative power of the procedures to BH method. 


of alpha-investing drops substantially. The reason is that in this case, the rule rejects most of the
hypotheses at the beginning, so its budget, and hence the significance level $\alpha_\ell$, increases rapidly.
When $\alpha_\ell$ gets close to one, any acceptance of a null hypothesis yields a large decrease in the budget
and makes it negative. The algorithm then halts and misses all subsequent discoveries.

Figure 2 exhibits the same metrics for the five procedures under Scenario II. In the presence of domain
knowledge, we see a faster drop-off in FDR and mFDR of the Lord and Lond procedures. Further, they

Figure 3: Total number of discoveries under setting described in Section 6.1.1 (Scenario I). 




(a) Lond (b) Lord 

Figure 4: Total number of discoveries under setting described in Section 6.1.1 (Scenario II). 


achieve higher power relative to BH than in Scenario I, especially for small values of $\pi$. This
supports our discussion in Section 4.1.

In the second experiment, we compute the expected number of discoveries made by Lord and
Lond under Scenarios I and II, for several values of $n$ and $\pi$. The results are depicted in Figures 3
and 4. As we see, for any fixed $\pi$, Lord and Lond exhibit a linear discovery rate in both scenarios.

Figure 5: FDR and mFDR of (adjusted) Lond and BH under setting described in Section 6.1.2 (Scenario I).
Figure (c) shows the relative power of Lond to BH method.


This corroborates our findings in Section 5 regarding the total discovery rate of these procedures. 

6.1.2 Dependent p-values 

We evaluate the performance of the Lond algorithm in the case where the test statistics, and thus the p-values,
are dependent. Here, we apply the adjustment to Lond, as described in Section 2.3, to address
dependency. We follow the same setting as in Section 6.1.1, except that for each trial, the test
statistics $Z = (Z_1, \dots, Z_n)$ are generated according to $Z \sim N(\theta, \Sigma)$, where $\theta = (\theta_1, \dots, \theta_n)$ and

Figure 6: FDR and mFDR of (adjusted) Lond and BH under setting described in Section 6.1.2 (Scenario II).
Figure (c) shows the relative power of Lond to BH method.


$\Sigma \in \mathbb{R}^{n\times n}$ is constructed as follows. Define

\[
\tilde{\Sigma}_{ij} =
\begin{cases}
1 & \text{if } i = j, \\
0.5 & \text{otherwise.}
\end{cases}
\]

We let

\[
\Sigma = \Lambda \tilde{\Sigma} \Lambda,
\]

where $\Lambda$ is a diagonal matrix with uniformly random signs on the diagonal. Figure 5 shows the
performance of Lond and (adjusted) BH$^3$ in controlling FDR and mFDR under Scenario I.
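
One way to draw test statistics with this covariance without forming the $n \times n$ matrix is a one-factor representation. This is an implementation choice of the sketch below, not something prescribed in the text: a shared standard normal factor produces the equicorrelation 0.5, and the random signs on the diagonal flip individual coordinates. The function names are hypothetical.

```python
import math
import random


def sample_dependent_statistics(theta, rng, rho=0.5):
    """Draw Z with mean theta and covariance Lambda * Sigma_tilde * Lambda.

    Sigma_tilde has unit diagonal and off-diagonal entries rho; Lambda is a
    diagonal matrix of independent uniformly random signs.
    """
    signs = [rng.choice((-1.0, 1.0)) for _ in theta]
    shared = rng.gauss(0.0, 1.0)  # common factor inducing correlation rho
    z = []
    for t, s in zip(theta, signs):
        noise = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
        z.append(t + s * noise)
    return z, signs


def covariance_entry(signs, i, j, rho=0.5):
    """Entry (i, j) of Lambda * Sigma_tilde * Lambda for the given signs."""
    return 1.0 if i == j else signs[i] * signs[j] * rho


if __name__ == "__main__":
    z, signs = sample_dependent_statistics([0.0] * 5, random.Random(0))
    print(z, signs)
```

Each coordinate's noise has unit variance and any two coordinates have covariance $\pm 0.5$, with the sign given by the product of the corresponding diagonal entries of $\Lambda$.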

Interestingly, in this setting the difference between the FDR and mFDR criteria is more pronounced.
As we see, while the adjusted BH controls FDR to be below $\alpha = 0.05$, the corresponding mFDR can be
as high as 0.45. The (adjusted) Lond, though, controls both FDR and mFDR to be less than $\alpha = 0.05$.

Figure 6 shows the performance of Lond and BH under Scenario II. 

To the best of our knowledge, there is no alpha-investing rule that controls mFDR in presence of 
dependency among the p-values, and hence we do not include its evaluation in this experiment. 


6.2 Microarray example 

We illustrate performance of Lond algorithm on a microarray example wherein the genes to be 
tested arrive in the stream in an online fashion and we would like to find the significant ones while 
controlling FDR over infinite time horizon. 

We use a prostate cancer data set [SFR+02], which contains genetic expression levels for $n = 12600$
genes on $m = 102$ men, $m_1 = 50$ from normal controls and $m_2 = 52$ from prostate cancer patients.
Each gene yields a two-sample t-statistic $t_i$ comparing tumor vs non-tumor cases as follows. Let $x_{ij}$
be the expression level of gene $i$ on patient $j$. Further let $\bar{x}_i(1)$ and $\bar{x}_i(2)$ denote the average of $x_{ij}$
for the normal controls and for the cancer patients. The two-sample t-statistic for testing gene $i$ is
given by

\[
t_i = \frac{\bar{x}_i(2) - \bar{x}_i(1)}{s_i}.
\]

Here $s_i^2$ is an estimate of the variance of $\bar{x}_i(2) - \bar{x}_i(1)$,

\[
s_i^2 = \frac{1}{m-2} \Big( \frac{1}{m_1} + \frac{1}{m_2} \Big) \Big\{ \sum_{j \in \text{non-tumor}} \big(x_{ij} - \bar{x}_i(1)\big)^2 + \sum_{j \in \text{tumor}} \big(x_{ij} - \bar{x}_i(2)\big)^2 \Big\}.
\]

The two-sided p-value for testing the null hypothesis $H_i$: "gene $i$ is null" is then given by $p_i =
F_{m-2}(t_i)$, where $F_{m-2}$ is the cumulative distribution function of a Student's t distribution with $m - 2$
degrees of freedom.
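
The pooled two-sample t-statistic described above can be computed directly from the two groups of expression levels. The sketch below is illustrative and covers only the statistic $t_i$ for a single gene; converting it to a p-value requires the Student's t CDF, which is not part of the Python standard library and is therefore left out. The function name is hypothetical.

```python
import math


def two_sample_t(normal, tumor):
    """Pooled-variance t-statistic comparing tumor (group 2) with normal (group 1)."""
    m1, m2 = len(normal), len(tumor)
    mean1 = sum(normal) / m1
    mean2 = sum(tumor) / m2
    # Within-group sums of squares around each group mean.
    ss = sum((x - mean1) ** 2 for x in normal) + sum((x - mean2) ** 2 for x in tumor)
    # Estimated variance of (mean2 - mean1) with m1 + m2 - 2 degrees of freedom.
    s2 = (1.0 / m1 + 1.0 / m2) * ss / (m1 + m2 - 2)
    return (mean2 - mean1) / math.sqrt(s2)


if __name__ == "__main__":
    print(two_sample_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```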

Since the gene expressions and thus the corresponding p-values are dependent in general, we apply
adjusted BH and Lond, as described in Section 2.3. For significance level $\alpha = 0.05$, BH returns
459 genes as significant and Lond returns 203 significant ones. The significant genes suggested by
Lond are a subset of those returned by BH. As we see, although Lond is designed to control FDR
in an online manner, it recovers a good proportion of the genes discovered by BH, which has access to all
p-values a priori.


3 Since the off-diagonal entries of $\Sigma$ can be negative, it does not satisfy PRDS [BY01] and in general we need the
adjustment described in Section 2.3 to BH to cope with dependency.

7 Proof of theorems and lemmas


7.1 Proof of Theorem 2.1 

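We first show that FDR$(n) \le \alpha$. Write

\[
{\rm FDR}(n) = \mathbb{E}\Big( \frac{V^{\theta}(n)}{D(n) \vee 1} \Big) = \mathbb{E}\Big( \sum_{j=1}^{n} \frac{F_j^{\theta}}{D(n) \vee 1} \Big) = \mathbb{E}\Big( \sum_{j=1}^{n} \frac{F_j^{\theta}}{\alpha_j} \cdot \frac{\alpha_j}{D(n) \vee 1} \Big). \tag{25}
\]

Clearly, $(D(n) \vee 1) \ge (D(j) \vee 1)$ and by the rule (7) we obtain

\[
\frac{\alpha_j}{D(n) \vee 1} \le \frac{\alpha_j}{D(j-1) + 1} \le \beta_j. \tag{26}
\]

Condition (8) can be stated as

\[
\forall \theta \in \Theta, \quad \mathbb{E}_{\theta}\big(F_j^{\theta} - \alpha_j \,\big|\, D(j-1)\big) \le 0.
\]

Therefore,

\[
\mathbb{E}\Big( \frac{F_j^{\theta}}{\alpha_j} \Big) = \mathbb{E}\Big\{ \mathbb{E}\Big( \frac{F_j^{\theta}}{\alpha_j} \,\Big|\, D(j-1) \Big) \Big\} \le \mathbb{E}(1) = 1, \tag{27}
\]

where we used the fact that $\alpha_j$ is deterministic given $D(j-1)$. Applying (26) and (27) to equation
(25), we get

\[
{\rm FDR}(n) \le \sum_{j=1}^{\infty} \beta_j = \alpha.
\]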

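We next prove that mFDR$(n) \le \alpha$. Note that for all $\theta \in \Theta$, we have

\[
\mathbb{E}_{\theta}\big(F_i^{\theta} - \alpha_i\big) = \mathbb{E}\big( \mathbb{E}_{\theta}(F_i^{\theta} - \alpha_i \,|\, D(i-1)) \big) \le 0,
\]

where we used Condition (8) in the last step. Adding these inequalities for $i = 1, \dots, n$ we get
that the following holds true for all $\theta \in \Theta$:

\[
\mathbb{E}_{\theta}\big(V^{\theta}(n)\big) - \sum_{i=1}^{n} \mathbb{E}_{\theta}(\alpha_i) \le 0. \tag{28}
\]

Further, for all $\theta \in \Theta$ we have

\begin{align}
\sum_{i=1}^{n} \mathbb{E}_{\theta}(\alpha_i) &= \sum_{i=1}^{n} \mathbb{E}\big\{ \beta_i (D(i-1) + 1) \big\} = \sum_{i=1}^{n} \sum_{j=1}^{i-1} \beta_i\, \mathbb{E}(D_j) + \sum_{i=1}^{n} \beta_i \notag\\
&= \sum_{j=1}^{n-1} \Big( \sum_{i=j+1}^{n} \beta_i \Big) \mathbb{E}(D_j) + \sum_{i=1}^{n} \beta_i \notag\\
&\le \Big( \sum_{i=1}^{n} \beta_i \Big) \big\{ \mathbb{E}(D(n)) + 1 \big\}. \tag{29}
\end{align}

The result follows by combining equations (28) and (29).

7.2 Proof of Theorem 2.4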


We first prove the claim on controlling mFDR. Proceeding along the same lines as the proof of
equation (28), we have the following for all $\theta \in \Theta$:

\[
\mathbb{E}_{\theta}\big(V^{\theta}(n)\big) - \sum_{i=1}^{n} \mathbb{E}_{\theta}(\alpha_i) \le 0. \tag{30}
\]

By equation (10), it is clear that for all $\theta \in \Theta$, we have

\[
\alpha\, \mathbb{E}_{\theta}\big(D(n) + 1\big) - \sum_{i=1}^{n} \mathbb{E}_{\theta}(\alpha_i) \ge 0, \tag{31}
\]

because the sum of significance levels $\alpha_i$ between two consecutive discoveries is bounded by $\alpha$. Combining
equations (30) and (31), we obtain the desired result.

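We next prove equation (12). Define random variables $X_i$ for $i \ge 1$ as follows:

\[
X_i =
\begin{cases}
1 & \text{if } \tau_i < \infty \text{ and the } i\text{-th discovery is false,} \\
0 & \text{if } \tau_i < \infty \text{ and the } i\text{-th discovery is true,} \\
  & \text{otherwise.}
\end{cases} \tag{32}
\]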


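We first show that $\mathbb{E}(X_i\, \mathbb{I}(\tau_i < \infty) \,|\, \tau_{i-1}) \le \alpha$. This is in fact the probability that, starting from
$\tau_{i-1} + 1$, at least one discovery occurs and the first discovery is false. Therefore, by applying the union
bound we have

\[
\mathbb{E}\big(X_i\, \mathbb{I}(\tau_i < \infty) \,\big|\, \tau_{i-1}\big) \le \sum_{\ell = \tau_{i-1}+1}^{\infty} \mathbb{P}_{\theta_\ell = 0}(D_\ell = 1) \le \sum_{r=1}^{\infty} \beta_r = \alpha, \tag{33}
\]

where the last inequality follows from the fact that under the null hypothesis, $\theta_\ell = 0$, we have $p_\ell \sim
U([0,1])$, and thus $D_\ell = 1$ with probability at most $\alpha_\ell$. Moreover, $\alpha_\ell = \beta_{\ell - \tau_{i-1}}$ for $\ell \le \tau_i$ due to
rule (10). Hence,

\begin{align}
\mathbb{E}\big({\rm FDP}(\tau_k)\, \mathbb{I}(\tau_k < \infty)\big) &= \mathbb{E}\Big[ \frac{1}{k} \sum_{i=1}^{k} X_i\, \mathbb{I}(\tau_k < \infty) \Big] \notag\\
&= \frac{1}{k} \sum_{i=1}^{k} \mathbb{E}\big( X_i\, \mathbb{I}(\tau_k < \infty) \big) \notag\\
&\le \frac{1}{k} \sum_{i=1}^{k} \mathbb{E}\big\{ \mathbb{E}\big( X_i\, \mathbb{I}(\tau_i < \infty) \,\big|\, \tau_{i-1} \big) \big\} \le \alpha. \notag
\end{align}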

7.2.1 Proof of Remark 2.6 

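Let $X_i$ be defined as per equation (32). We have

\[
V^{\theta}(n) = \sum_{i=1}^{\infty} X_i\, \mathbb{I}(\tau_i \le n). \tag{34}
\]

Therefore

\[
\mathbb{E}\big(V^{\theta}(n)\big) = \sum_{i=1}^{\infty} \mathbb{E}\big( X_i\, \mathbb{I}(\tau_i \le n) \big).
\]

By applying the union bound, we have

\begin{align}
\mathbb{E}\big( X_i\, \mathbb{I}(\tau_i \le n) \,\big|\, \tau_{i-1} \big) &\le \Big( \sum_{\ell = \tau_{i-1}+1}^{n} \mathbb{P}_{\theta_\ell = 0}(D_\ell = 1) \Big)\, \mathbb{I}(\tau_{i-1} \le n-1) \notag\\
&\le \alpha\, \mathbb{I}(\tau_{i-1} \le n-1). \tag{35}
\end{align}

Hence, adopting the convention $\tau_0 = 0$, we get

\begin{align}
\mathbb{E}\big(V^{\theta}(n)\big) &\le \alpha \sum_{i=1}^{\infty} \mathbb{P}(\tau_{i-1} \le n-1) = \alpha \sum_{i=1}^{\infty} \mathbb{P}(\tau_i \le n-1) + \alpha \notag\\
&= \alpha\, \mathbb{E}\Big( \sum_{i=1}^{\infty} \mathbb{I}(\tau_i \le n-1) \Big) + \alpha \notag\\
&= \alpha\, \mathbb{E}\big(D(n-1)\big) + \alpha. \tag{36}
\end{align}

The result follows.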


7.3 Proof of Theorem 2.7 

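Let $\mathcal{E}_{v,u}$ be the event that Lond makes $v$ false and $u$ true discoveries in $\mathcal{H}(n)$. We further denote
by $n_0$ and $n_1$ the number of true and false null hypotheses in $\mathcal{H}(n)$. The FDR$(n)$ is then

\[
{\rm FDR}(n) = \mathbb{E}\big({\rm FDP}(n)\big) = \sum_{v=0}^{n_0} \sum_{u=0}^{n_1} \frac{v}{(v+u) \vee 1}\, \mathbb{P}(\mathcal{E}_{v,u}).
\]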

We use the following lemma from [BY01] and state its proof here for the reader's convenience.

Lemma 7.1 ([BY01]). The following holds true:

\[
\mathbb{P}(\mathcal{E}_{v,u}) = \frac{1}{v} \sum_{i=1}^{n_0} \mathbb{P}\big( (p_i \le \alpha_i) \cap \mathcal{E}_{v,u} \big).
\]


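Proof. Fix $v$ and $u$ and for a subset $\omega \subseteq \{1, \dots, n_0\}$ with $|\omega| = v$, denote by $\mathcal{E}^{\omega}_{v,u}$ the event that the
$v$ false discoveries are $\omega$. We further note that

\[
\mathbb{P}\big( (p_i \le \alpha_i) \cap \mathcal{E}^{\omega}_{v,u} \big) =
\begin{cases}
\mathbb{P}(\mathcal{E}^{\omega}_{v,u}) & \text{if } i \in \omega, \\
0 & \text{otherwise.}
\end{cases}
\]

Therefore

\begin{align}
\sum_{i=1}^{n_0} \mathbb{P}\big( (p_i \le \alpha_i) \cap \mathcal{E}_{v,u} \big) &= \sum_{i=1}^{n_0} \sum_{\omega} \mathbb{P}\big( (p_i \le \alpha_i) \cap \mathcal{E}^{\omega}_{v,u} \big) \notag\\
&= \sum_{\omega} \sum_{i=1}^{n_0} \mathbb{P}\big( (p_i \le \alpha_i) \cap \mathcal{E}^{\omega}_{v,u} \big) \notag\\
&= \sum_{\omega} \sum_{i=1}^{n_0} \mathbb{I}(i \in \omega)\, \mathbb{P}(\mathcal{E}^{\omega}_{v,u}) = \sum_{\omega} v\, \mathbb{P}(\mathcal{E}^{\omega}_{v,u}) = v\, \mathbb{P}(\mathcal{E}_{v,u}), \notag
\end{align}

which completes the proof. □

Applying Lemma 7.1 we obtain

\[
{\rm FDR}(n) = \sum_{v=0}^{n_0} \sum_{u=0}^{n_1} \frac{1}{(v+u) \vee 1} \sum_{i=1}^{n_0} \mathbb{P}\big( (p_i \le \alpha_i) \cap \mathcal{E}_{v,u} \big). \tag{37}
\]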


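For $i \ge 1$ and $0 \le s \le i-1$, we let $C^{(i)}_{s,v,u}$ be the event that if $p_i \le \alpha_i$ (i.e., $H_i$ is rejected) then
there are $u$ true discoveries and $v$ false discoveries in $\mathcal{H}(n)$ such that $s$ false discoveries
occur before time step $i$. Clearly,

\[
\mathbb{P}\big( (p_i \le \alpha_i) \cap \mathcal{E}_{v,u} \big) = \mathbb{P}\Big( \bigcup_{s=0}^{i-1} \big\{ (p_i \le \alpha_i) \cap C^{(i)}_{s,v,u} \big\} \Big).
\]

Since the events $C^{(i)}_{s,v,u}$ are disjoint for different values of $s$, we obtain

\[
{\rm FDR}(n) = \sum_{v=1}^{n_0} \sum_{u=0}^{n_1} \frac{1}{v+u} \sum_{i=1}^{n_0} \sum_{s=0}^{i-1} \mathbb{P}\big( (p_i \le \alpha_i) \cap C^{(i)}_{s,v,u} \big). \tag{38}
\]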


Rearranging the terms and recalling the rule $\alpha_i = \beta_i (D(i-1) + 1)$, we arrive at

\[
{\rm FDR}(n) = \sum_{i=1}^{n_0} \sum_{s=0}^{i-1} \sum_{v=1}^{n_0} \sum_{u=0}^{n_1} \frac{1}{v+u}\, \mathbb{P}\big( [p_i \le \beta_i (s+1)] \cap C^{(i)}_{s,v,u} \big). \tag{39}
\]


For $s + 1 \le k \le n$, define the event $C^{(i)}_{s,k}$ as

\[
C^{(i)}_{s,k} = \bigcup_{v,u:\; v+u=k} C^{(i)}_{s,v,u}.
\]

In words, $C^{(i)}_{s,k}$ is the event that there are $k$ discoveries in $\mathcal{H}(n)$ with $s$ false discoveries occurring before
time $i$. Writing the RHS of equation (39) in terms of $k$ instead of $v$ and $u$, we have

\begin{align}
{\rm FDR}(n) &= \sum_{i=1}^{n_0} \sum_{s=0}^{i-1} \sum_{k=s+1}^{n} \frac{1}{k}\, \mathbb{P}\big( [p_i \le \beta_i (s+1)] \cap C^{(i)}_{s,k} \big) \notag\\
&\le \sum_{i=1}^{n_0} \sum_{s=0}^{i-1} \frac{1}{s+1} \sum_{k=s+1}^{n} \mathbb{P}\big( [p_i \le \beta_i (s+1)] \cap C^{(i)}_{s,k} \big). \notag
\end{align}

We define $C^{(i)}_{s}$ as the event that $s$ false discoveries occur before time $i$. Equivalently,

\[
C^{(i)}_{s} = \bigcup_{s+1 \le k \le n} C^{(i)}_{s,k}.
\]

Given that the events $C^{(i)}_{s,k}$ are disjoint for different values of $k$, we get

\[
{\rm FDR}(n) \le \sum_{i=1}^{n_0} \sum_{s=0}^{i-1} \frac{1}{s+1}\, \mathbb{P}\big( [p_i \le \beta_i (s+1)] \cap C^{(i)}_{s} \big).
\]


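For $j \in \{0, \dots, s\}$, denote

\[
Q_{i,j,s} = \mathbb{P}\big( \{ p_i \in [\beta_i j, \beta_i (j+1)] \} \cap C^{(i)}_{s} \big).
\]

Writing the bound above in terms of $Q_{i,j,s}$, we have

\begin{align}
{\rm FDR}(n) &\le \sum_{i=1}^{n_0} \sum_{s=0}^{i-1} \frac{1}{s+1} \sum_{j=0}^{s} Q_{i,j,s} \notag\\
&= \sum_{i=1}^{n_0} \sum_{j=0}^{i-1} \sum_{s=j}^{i-1} \frac{1}{s+1}\, Q_{i,j,s} \le \sum_{i=1}^{n_0} \sum_{j=0}^{i-1} \frac{1}{j+1} \sum_{s=j}^{i-1} Q_{i,j,s} \notag\\
&\le \sum_{i=1}^{n_0} \sum_{j=0}^{i-1} \frac{1}{j+1}\, \mathbb{P}\big( \beta_i j \le p_i \le \beta_i (j+1) \big) \notag\\
&\overset{(a)}{=} \sum_{i=1}^{n_0} \Big( \sum_{j=1}^{i} \frac{1}{j} \Big) \beta_i \le \alpha, \notag
\end{align}

where in step (a), we used the fact $p_i \sim U([0,1])$ under the null hypothesis $H_i$.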


7.4 Proof of Theorem 5.1 

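We let $\tau_\ell$ denote the time of the $\ell$-th discovery. Under the mixture model, for $m \ge \tau_\ell + 1$ we have

\[
\mathbb{P}(\tau_{\ell+1} > m \,|\, \tau_\ell) = \prod_{i=\tau_\ell+1}^{m} \big( 1 - G((\ell+1)\beta_i) \big) \le \exp\Big\{ - \sum_{i=\tau_\ell+1}^{m} G((\ell+1)\beta_i) \Big\}. \tag{40}
\]

Recall that $G(x) = (1-\varepsilon)x + \varepsilon F(x)$ and note that $\tau_\ell \ge \ell$. Hence, for some constant $L_0$ and all
$\ell \ge L_0$ the following holds:

\begin{align}
\sum_{i=\tau_\ell+1}^{m} G((\ell+1)\beta_i) &\ge \varepsilon\lambda C^{\kappa} (\ell+1)^{\kappa} \sum_{i=\tau_\ell+1}^{m} i^{-\kappa\nu} \tag{41}\\
&\ge \varepsilon\lambda C^{\kappa} (\ell+1)^{\kappa}\, m^{-\kappa\nu} \sum_{i=\tau_\ell+1}^{m} 1 \notag\\
&\ge \varepsilon\lambda C^{\kappa} (\ell+1)^{\kappa}\, m^{-\kappa\nu} (m - \tau_\ell - 1). \tag{42}
\end{align}

Combining equations (40) and (42) we get

\[
\mathbb{P}(\tau_{\ell+1} > m \,|\, \tau_\ell) \le \exp\big\{ - \varepsilon\lambda C^{\kappa} (\ell+1)^{\kappa} (m - \tau_\ell - 1)\, m^{-\kappa\nu} \big\}.
\]

Next we compute $\mathbb{E}(\tau_{\ell+1} \,|\, \tau_\ell)$. Since $\mathbb{P}(\tau_{\ell+1} \ge \tau_\ell + 1 \,|\, \tau_\ell) = 1$, we have

\begin{align}
\mathbb{E}(\tau_{\ell+1} \,|\, \tau_\ell) &= \tau_\ell + 1 + \sum_{m=\tau_\ell+1}^{\infty} \mathbb{P}(\tau_{\ell+1} > m \,|\, \tau_\ell) \notag\\
&\le \tau_\ell + 2 + \sum_{i=1}^{\infty} \exp\big\{ - \varepsilon\lambda C^{\kappa} (\ell+1)^{\kappa}\, i\, (i + \tau_\ell + 1)^{-\kappa\nu} \big\}. \notag
\end{align}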

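We split the summation over $1 \le i \le \tau_\ell + 1$ and $\tau_\ell + 2 \le i$ and upper bound each term separately.

\begin{align}
I_1 &= \sum_{i=1}^{\tau_\ell+1} \exp\big\{ - \varepsilon\lambda C^{\kappa} (\ell+1)^{\kappa}\, i\, (i + \tau_\ell + 1)^{-\kappa\nu} \big\} \notag\\
&\le \sum_{i=1}^{\tau_\ell+1} \exp\big\{ - \varepsilon\lambda C^{\kappa} (\ell+1)^{\kappa}\, i\, (2\tau_\ell + 2)^{-\kappa\nu} \big\} \notag\\
&\overset{(a)}{\le} \int_{0}^{\tau_\ell+1} \exp\big\{ - \varepsilon\lambda C^{\kappa} (\ell+1)^{\kappa} (2\tau_\ell + 2)^{-\kappa\nu} z \big\}\, dz \notag\\
&\le \frac{(2\tau_\ell + 2)^{\kappa\nu}}{\varepsilon\lambda C^{\kappa} (\ell+1)^{\kappa}}, \tag{43}
\end{align}

where (a) holds since the summand is decreasing in $i$. We next bound the second term as follows.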


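\begin{align}
I_2 &= \sum_{i=\tau_\ell+2}^{\infty} \exp\big\{ - \varepsilon\lambda C^{\kappa} (\ell+1)^{\kappa}\, i\, (i + \tau_\ell + 1)^{-\kappa\nu} \big\} \notag\\
&\le \sum_{i=\tau_\ell+2}^{\infty} \exp\big\{ - \varepsilon\lambda C^{\kappa} (\ell+1)^{\kappa}\, 2^{-\kappa\nu}\, i^{1-\kappa\nu} \big\} \notag\\
&\overset{(a)}{\le} \int_{\tau_\ell+1}^{\infty} \exp\big\{ - \varepsilon\lambda C^{\kappa} (\ell+1)^{\kappa}\, 2^{-\kappa\nu}\, z^{1-\kappa\nu} \big\}\, dz, \tag{44}
\end{align}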


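where (a) holds since the summand is decreasing in $i$. Define $\tilde{C}_\ell = \varepsilon\lambda C^{\kappa} (\ell+1)^{\kappa}\, 2^{-\kappa\nu}$. It is straightforward
to see that

\[
\int_{0}^{\infty} \exp\big( - \tilde{C}_\ell\, z^{1-\kappa\nu} \big)\, dz = \frac{1}{1-\kappa\nu}\, \tilde{C}_\ell^{-1/(1-\kappa\nu)}\, \Gamma\Big( \frac{1}{1-\kappa\nu} \Big). \tag{45}
\]

Combining the bounds (43), (44) and (45), we obtain that for $\ell \ge L_0$,

\[
\mathbb{E}(\tau_{\ell+1} \,|\, \tau_\ell) \le \tau_\ell + 2 + \frac{(2\tau_\ell + 2)^{\kappa\nu}}{\varepsilon\lambda C^{\kappa} (\ell+1)^{\kappa}} + C' (\ell+1)^{-\frac{\kappa}{1-\kappa\nu}}, \tag{46}
\]

with

\[
C' = \frac{1}{1-\kappa\nu} \big( \varepsilon\lambda C^{\kappa}\, 2^{-\kappa\nu} \big)^{-1/(1-\kappa\nu)}\, \Gamma\Big( \frac{1}{1-\kappa\nu} \Big).
\]

Here $\Gamma(t) = \int_{0}^{\infty} z^{t-1} e^{-z}\, dz$ is the gamma function.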
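
The integral identity (45) follows from a change of variables. As a short sketch of that step, write $a = 1 - \kappa\nu \in (0,1)$ and let $c > 0$ stand for the constant multiplying $z^{a}$ in the exponent (both are shorthand introduced only for this derivation). Substituting $u = c z^{a}$, so that $z = (u/c)^{1/a}$ and $dz = \frac{1}{a} c^{-1/a} u^{1/a - 1}\, du$, gives

\[
\int_{0}^{\infty} e^{-c z^{a}}\, dz = \frac{1}{a}\, c^{-1/a} \int_{0}^{\infty} u^{1/a - 1} e^{-u}\, du = \frac{1}{a}\, c^{-1/a}\, \Gamma\Big( \frac{1}{a} \Big).
\]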

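Let $\Lambda_\ell = \mathbb{E}(\tau_\ell)$. Since $f(s) = s^{\kappa\nu}$ is concave, by applying Jensen's inequality to equation (46), we
arrive at the following recursive bound for $\ell \ge L_0$:

\[
\Lambda_{\ell+1} \le \Lambda_\ell + 2 + \frac{(2\Lambda_\ell + 2)^{\kappa\nu}}{\varepsilon\lambda C^{\kappa} (\ell+1)^{\kappa}} + C' (\ell+1)^{-\frac{\kappa}{1-\kappa\nu}}. \tag{47}
\]

In Appendix A, we show that equation (47) implies

\[
\Lambda_\ell \le \tilde{C}\, \ell^{\frac{1-\kappa}{1-\kappa\nu}} \tag{48}
\]

for some constant $\tilde{C} = \tilde{C}(\kappa, \nu)$. Let $\zeta_n = (\delta n/\tilde{C})^{\frac{1-\kappa\nu}{1-\kappa}}$. Applying Markov's inequality, for any $n$ and
any fixed $0 < \delta < 1$, we obtain

\[
\mathbb{P}\{ D^{\rm Lond}(n) < \zeta_n \} = \mathbb{P}\{ \tau_{\zeta_n} > n \} \le \frac{\Lambda_{\zeta_n}}{n} \le \frac{\tilde{C}\, \zeta_n^{\frac{1-\kappa}{1-\kappa\nu}}}{n} = \delta, \tag{49}
\]

which completes the proof.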


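A Derivation of equation (48)

Let $C_1 = 2^{\kappa\nu}/(\varepsilon\lambda C^{\kappa})$ and $C_2 = C' + 2$. We relax the bound (47) as

\[
\Lambda_{\ell+1} \le \Lambda_\ell + C_1 \frac{(\Lambda_\ell + 1)^{\kappa\nu}}{(\ell+1)^{\kappa}} + C_2. \tag{50}
\]

We write $\Lambda_\ell = g_\ell + \Delta_\ell$, where

\[
g_\ell = \Big( C_1 \frac{1-\kappa\nu}{1-\kappa} \Big)^{\frac{1}{1-\kappa\nu}} (\ell+1)^{\frac{1-\kappa}{1-\kappa\nu}} - 1.
\]

Writing equation (50) in terms of $\Delta_\ell$ and $g_\ell$ we get, for $\ell \ge L_0$,

\begin{align}
g_{\ell+1} + \Delta_{\ell+1} &\le g_\ell + \Delta_\ell + C_1 \frac{(g_\ell + \Delta_\ell + 1)^{\kappa\nu}}{(\ell+1)^{\kappa}} + C_2 \notag\\
&= g_\ell + \Delta_\ell + C_1 \frac{(g_\ell + 1)^{\kappa\nu}}{(\ell+1)^{\kappa}} \Big( 1 + \frac{\Delta_\ell}{g_\ell + 1} \Big)^{\kappa\nu} + C_2. \notag
\end{align}

Since $0 < \kappa\nu < 1$, applying Bernoulli's inequality, we obtain

\[
g_{\ell+1} + \Delta_{\ell+1} \le g_\ell + \Delta_\ell + C_1 \frac{(g_\ell + 1)^{\kappa\nu}}{(\ell+1)^{\kappa}} \Big( 1 + \kappa\nu \frac{\Delta_\ell}{g_\ell + 1} \Big) + C_2. \tag{51}
\]

Note that

\begin{align}
C_1 \frac{(g_\ell + 1)^{\kappa\nu}}{(\ell+1)^{\kappa}} &= \Big( C_1 \frac{1-\kappa\nu}{1-\kappa} \Big)^{\frac{1}{1-\kappa\nu}} \Big( \frac{1-\kappa}{1-\kappa\nu} \Big) (\ell+1)^{\frac{1-\kappa}{1-\kappa\nu} - 1} \notag\\
&\le \Big( C_1 \frac{1-\kappa\nu}{1-\kappa} \Big)^{\frac{1}{1-\kappa\nu}} \Big\{ (\ell+2)^{\frac{1-\kappa}{1-\kappa\nu}} - (\ell+1)^{\frac{1-\kappa}{1-\kappa\nu}} \Big\} \notag\\
&= g_{\ell+1} - g_\ell, \notag
\end{align}

where the inequality holds since $(1-\kappa)/(1-\kappa\nu) > 1$. Using this bound in equation (51) gives the
following

\[
\Delta_{\ell+1} \le \Delta_\ell + C_1 \kappa\nu \frac{(g_\ell + 1)^{\kappa\nu - 1}}{(\ell+1)^{\kappa}} \Delta_\ell + C_2.
\]

Plugging in for $g_\ell$ we arrive at

\[
\Delta_{\ell+1} \le \Delta_\ell + \frac{\kappa\nu (1-\kappa)}{1-\kappa\nu} \frac{\Delta_\ell}{\ell+1} + C_2, \tag{52}
\]

for $\ell \ge L_0$. The claim follows readily by applying the lemma below with $\beta = \kappa\nu$ and $\mu = (1-\kappa)/(1-\kappa\nu)$.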

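Lemma A.1. Suppose that the sequence $\{\Delta_\ell\}_{\ell=1}^{\infty}$ satisfies the following inequalities for $\ell \ge L_0$ with
constants $0 < \beta < 1$, $\mu > 1$ and $C > 0$:

\[
\Delta_{\ell+1} \le \Delta_\ell + \beta\mu \frac{\Delta_\ell}{\ell+1} + C. \tag{53}
\]

Then there exists a constant $C_0 > 0$ such that $\Delta_\ell \le C_0 \ell^{\mu}$ for $\ell \ge 1$.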


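Proof. Choose $C_0$ as follows:

\[
C_0 = \max\Big\{ \big( \Delta_\ell\, \ell^{-\mu} \big)_{1 \le \ell \le L_0},\; \frac{2C}{\mu(1-\beta)} \Big\}. \tag{54}
\]

We prove the claim by induction on $\ell$. The induction basis $\ell = L_0$ holds clearly by the choice of $C_0$.
Suppose that the claim holds for $\ell$; we prove it for $\ell + 1$. Using equation (53), we have

\begin{align}
\Delta_{\ell+1} &\le \Delta_\ell + \beta\mu \frac{\Delta_\ell}{\ell+1} + C \notag\\
&\le C_0 \ell^{\mu} + C_0 \beta\mu \frac{\ell^{\mu}}{\ell+1} + C. \notag
\end{align}

In order to prove the claim, it is sufficient to show that the RHS above is not larger than $C_0 (\ell+1)^{\mu}$.
Equivalently,

\[
(\ell+1)\ell^{\mu} + \beta\mu\ell^{\mu} + (C/C_0)(\ell+1) - (\ell+1)^{\mu+1} \le 0.
\]

Using the inequality $(\mu+1)\ell^{\mu} \le (\ell+1)^{\mu+1} - \ell^{\mu+1}$, we bound the LHS as follows:

\begin{align}
(\ell+1)\ell^{\mu} + \beta\mu\ell^{\mu} + (C/C_0)(\ell+1) - (\ell+1)^{\mu+1}
&\le (1 + \beta\mu)\ell^{\mu} + (C/C_0)(\ell+1) - (\mu+1)\ell^{\mu} \notag\\
&= (\beta - 1)\mu\ell^{\mu} + (C/C_0)(\ell+1) \notag\\
&\le (\beta - 1)\mu\ell^{\mu} + (2C/C_0)\ell \le 0, \notag
\end{align}

where the last inequality follows from the choice of $C_0$ as per (54). □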
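
The statement of Lemma A.1 can be sanity-checked numerically. The sketch below is an illustration, not part of the proof. It iterates recursion (53) with equality (the largest sequence the recursion allows), takes $L_0 = 1$ so that the constant reduces to $C_0 = \max\{\Delta_1,\, 2C/(\mu(1-\beta))\}$, and compares $\Delta_\ell$ against $C_0 \ell^{\mu}$. The function names and parameter values are hypothetical.

```python
def worst_case_sequence(delta_1, beta, mu, c, length):
    """Iterate Delta_{l+1} = Delta_l + beta*mu*Delta_l/(l+1) + C, starting from Delta_1."""
    seq = [delta_1]
    for l in range(1, length):
        prev = seq[-1]  # prev is Delta_l
        seq.append(prev + beta * mu * prev / (l + 1) + c)
    return seq


def lemma_constant(delta_1, beta, mu, c):
    """C_0 = max(Delta_1, 2C / (mu * (1 - beta))), the choice of constant with L_0 = 1."""
    return max(delta_1, 2.0 * c / (mu * (1.0 - beta)))


if __name__ == "__main__":
    beta, mu, c = 0.6, 1.5, 2.0
    seq = worst_case_sequence(3.0, beta, mu, c, 1000)
    c0 = lemma_constant(3.0, beta, mu, c)
    ok = all(seq[l - 1] <= c0 * l ** mu for l in range(1, 1001))
    print(c0, ok)
```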
B Proof of Theorem 5.2


Given the update rule (10), it is clear that the times between successive discoveries are i.i.d. under
the mixture model. Therefore, the process of discoveries is a renewal process. Let $\mu = \mathbb{E}(t_i)$ be the
mean inter-discovery time, where $t_i = \tau_i - \tau_{i-1}$. By the strong law of large numbers for renewal
processes [Dur10], the following holds almost surely:

\[
\lim_{n\to\infty} \frac{1}{n} D^{\rm Lord}(n) = \frac{1}{\mu}.
\]

We also have

\[
\mathbb{P}(t_i \ge k) = \prod_{\ell=1}^{k} \big( 1 - G(\beta_\ell) \big) \le e^{-\sum_{\ell=1}^{k} G(\beta_\ell)}.
\]

Therefore,

\[
\mu = \mathbb{E}(t_i) = \sum_{k=1}^{\infty} \mathbb{P}(t_i \ge k) \le \sum_{k=1}^{\infty} e^{-\sum_{\ell=1}^{k} G(\beta_\ell)},
\]

which yields the desired result. Equation (17) follows by using the elementary renewal theorem [Dur10].